    Reactive and Proactive Sharing Across Concurrent Analytical Queries

    Today an ever-increasing amount of data is collected and analyzed by researchers, businesses, and scientists in data warehouses (DW). In addition to the data size, the number of users and applications querying data grows exponentially. The increasing concurrency is itself a challenge in query execution, but it also introduces an opportunity for synergy between concurrent queries. Traditional DW execution engines follow a query-centric approach, where each query is optimized and executed independently. Workloads with increased concurrency, on the other hand, have several queries with common parts of data and work, creating the opportunity for sharing among concurrent queries. Sharing can be reactive to inherently existing sharing opportunities, or proactive, redesigning query operators to maximize the sharing opportunities. This demonstration showcases the impact of proactive and reactive sharing by comparing and integrating representative state-of-the-art techniques: Simultaneous Pipelining (SP) for reactive sharing, which shares intermediate results of common sub-plans, and Global Query Plans (GQP) for proactive sharing, which build and evaluate a single query plan with shared operators. We visually demonstrate, in an interactive interface, the behavior of both sharing approaches on top of a state-of-the-art storage engine using the original prototypes. We show that pull-based sharing for SP eliminates the serialization point imposed by the original push-based approach. Then, we compare, through a sensitivity analysis, the performance of SP and GQP. Finally, we show that SP can improve the performance of GQP for a query mix with common sub-plans.
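
    To make the contrast concrete, here is a minimal Python sketch of a shared scan, the core idea behind proactive sharing: one pass over the data serves several concurrent aggregate queries at once. The table layout, predicates, and query names are illustrative assumptions, not part of the SP/GQP prototypes.

        rows = [{"region": r, "amount": a}
                for r, a in [("EU", 10), ("US", 25), ("EU", 5), ("ASIA", 40)]]

        # Each "query" is registered on the same scan as (predicate, running sum),
        # instead of each query scanning the table independently (query-centric).
        queries = {
            "q1_sum_eu":  (lambda row: row["region"] == "EU", 0),
            "q2_sum_all": (lambda row: True,                  0),
            "q3_sum_us":  (lambda row: row["region"] == "US", 0),
        }
        results = {name: start for name, (_, start) in queries.items()}

        # Shared scan: the table is read exactly once; every registered query's
        # predicate is evaluated against each tuple as it streams past.
        for row in rows:
            for name, (pred, _) in queries.items():
                if pred(row):
                    results[name] += row["amount"]

        print(results)  # {'q1_sum_eu': 15, 'q2_sum_all': 80, 'q3_sum_us': 25}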

    Taster: Self-Tuning, Elastic and Online Approximate Query Processing

    Current Approximate Query Processing (AQP) engines are far from silver-bullet solutions, as they adopt several static design decisions that target specific workloads and deployment scenarios. Offline AQP engines target deployments with a large storage budget and offer substantial performance improvements for predictable workloads, but fail when new query types appear, e.g., due to shifting user interests. At the other extreme, online AQP engines assume that query workloads are unpredictable, and therefore build all samples at query time, without reusing samples (or parts of them) across queries. Clearly, both extremes miss out on different opportunities for optimizing performance and cost. In this paper, we present Taster, a self-tuning, elastic, online AQP engine that synergistically combines the benefits of online and offline AQP. Taster performs online approximation by injecting synopses (samples and sketches) into the query plan, while at the same time it strategically materializes and reuses synopses across queries, and continuously adapts them to changes in the workload and to the available storage resources. Our experimental evaluation shows that Taster adapts to shifting workloads and varying storage budgets, and always matches or significantly outperforms the best-performing state-of-the-art AQP approaches (online or offline).
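
    As a rough illustration of sample-based approximation with a reusable synopsis, the Python sketch below materializes a uniform sample once and answers two SUM queries from it with 1/rate scaling. The sampling rate, data, and scaling are assumptions made for illustration; Taster's actual synopsis selection, adaptation, and storage management are considerably more involved.

        import random

        random.seed(42)
        table = [random.randint(1, 100) for _ in range(100_000)]

        def build_sample(data, rate):
            """Materialize a uniform row-level sample (the reusable synopsis)."""
            return [v for v in data if random.random() < rate], rate

        def approx_sum(sample, rate, pred=lambda v: True):
            """Scale the sample's aggregate by 1/rate to estimate the true SUM."""
            return sum(v for v in sample if pred(v)) / rate

        sample, rate = build_sample(table, rate=0.01)    # built once, then stored
        q1 = approx_sum(sample, rate)                    # estimate of SUM(*)
        q2 = approx_sum(sample, rate, lambda v: v > 50)  # synopsis reused for a new predicate
        print("SUM exact:", sum(table), "approx:", round(q1))
        print("SUM v>50 exact:", sum(v for v in table if v > 50), "approx:", round(q2))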

    BLOCK: Efficient Execution of Spatial Range Queries in Main-Memory

    The execution of spatial range queries is at the core of many applications, particularly in the simulation sciences but also in many other domains. Although main memory in desktops and supercomputers alike has grown considerably in recent years, most spatial indexes supporting the efficient execution of range queries are still optimized only for disk access (minimizing disk page reads). Recent research has primarily focused on optimizing known disk-based approaches for memory (through cache alignment, etc.) but has not fundamentally revisited index structures for memory. In this paper we develop BLOCK, a novel approach to executing range queries on spatial data featuring volumetric objects in main memory. Our approach is built on the key insight that in-memory approaches need to be optimized to reduce the number of intersection tests (between objects and the query, but also within the index structure). Our experimental results show that BLOCK outperforms known in-memory indexes as well as in-memory implementations of disk-based spatial indexes by up to a factor of 7. The experiments also show that it is more scalable than competing approaches as the data sets become denser.
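
    A minimal way to see why reducing intersection tests matters in memory is a uniform grid over 2D points: a range query probes only the cells that overlap the query rectangle, so far fewer point-in-range tests run than in a full scan. This Python sketch is only illustrative; BLOCK targets volumetric objects with a more refined structure, and the cell size below is an assumed tuning knob.

        from collections import defaultdict

        CELL = 10.0  # grid cell edge length (assumed tuning knob)

        def cell_of(x, y):
            return (int(x // CELL), int(y // CELL))

        def build_grid(points):
            grid = defaultdict(list)
            for p in points:
                grid[cell_of(*p)].append(p)
            return grid

        def range_query(grid, xmin, ymin, xmax, ymax):
            hits = []
            cx0, cy0 = cell_of(xmin, ymin)
            cx1, cy1 = cell_of(xmax, ymax)
            for cx in range(cx0, cx1 + 1):      # visit only overlapping cells
                for cy in range(cy0, cy1 + 1):
                    for (x, y) in grid.get((cx, cy), []):
                        if xmin <= x <= xmax and ymin <= y <= ymax:
                            hits.append((x, y))
            return hits

        pts = [(3, 4), (12, 7), (25, 25), (14, 9)]
        print(range_query(build_grid(pts), 10, 5, 20, 10))  # [(12, 7), (14, 9)]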

    Timely and cost-efficient data exploration through adaptive tuning

    Modern applications accumulate data at an exponentially increasing rate and traditional database systems struggle to keep up. Decision support systems used in industry rely heavily on data analysis and require real-time responses irrespective of data size. To offer real-time responses, traditional databases rely on long preprocessing steps, such as data loading and offline tuning. Loading transforms raw data into a format that reduces data access cost. Through tuning, database systems build access paths (e.g., indexes) to improve query performance by avoiding or reducing unnecessary data access. The decision on which access paths to build depends on the expected workload; thus, the database system assumes knowledge of future queries. However, decision support systems and data exploration applications have shifting requirements. As a consequence, an offline tuner with no a priori knowledge of the full workload is unable to decide on the optimal set of access paths. Furthermore, access path size increases along with the input data; thus, building precise access paths over the entire dataset limits the scalability of database systems. Apart from long database preprocessing, offering efficient data access despite increasing data volume becomes harder due to hardware architectural constraints such as memory size. To achieve low query latency, modern database systems store data in main memory. However, there is a physical limit on main memory size in a server. Thus, applications must trade memory space for query efficiency. To provide high performance irrespective of dataset growth and query workload, a database system needs to (i) shift the tuning decision from offline to query time, (ii) enable the query engine to exploit application properties in choosing fast access paths, and (iii) reduce the size of access paths to limit storage cost. In this thesis, we present techniques for query processing that are adaptive to the workload, application requirements, and available storage resources. Specifically, to address dynamic workloads, we turn access path creation into a continuous process which fully adapts to incoming queries. We assign all decisions on data access and access path materialization to the database optimizer at query time, and enable access path materialization to take place as a by-product of query execution, thereby removing the need for long offline tuning steps. Furthermore, we take advantage of application characteristics (precision requirements, resource availability) and design a system which can adaptively trade precision and resources for performance. By combining precise and approximate access paths, the database system reduces query response time and minimizes resource utilization. Approximate access paths (e.g., sketches) require less space than their precise counterparts, and offer constant access time. By improving data processing performance while reducing storage requirements through (i) adaptive access path materialization and (ii) approximate, space-efficient access paths used when appropriate, our work minimizes data access cost and provides real-time responses for data exploration applications irrespective of data growth.
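
    The shift from offline tuning to query-time access path creation can be sketched in a few lines of Python: an index is built lazily, as a by-product of the first query that needs it, and reused afterwards. The column names and the choice of a plain hash index are illustrative assumptions, not the system described in the thesis.

        class AdaptiveTable:
            def __init__(self, rows):
                self.rows = rows
                self.indexes = {}  # column -> {value: [row positions]}

            def select_eq(self, column, value):
                if column not in self.indexes:
                    # First access: pay the scan once, and materialize the
                    # index while we are touching every row anyway.
                    idx = {}
                    for pos, row in enumerate(self.rows):
                        idx.setdefault(row[column], []).append(pos)
                    self.indexes[column] = idx
                # Later queries on this column hit the index directly.
                return [self.rows[p] for p in self.indexes[column].get(value, [])]

        t = AdaptiveTable([{"city": c, "n": i} for i, c in
                           enumerate(["Geneva", "Zurich", "Geneva", "Basel"])])
        print(t.select_eq("city", "Geneva"))  # builds the index on first use
        print(t.select_eq("city", "Basel"))   # reuses it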

    Scaling the Memory Power Wall with DRAM-Aware Data Management

    Improving the energy efficiency of database systems has emerged as an important topic of research over the past few years. While significant attention has been paid to optimizing the power consumption of traditional disk-based databases, little attention has been paid to the growing cost of DRAM power consumption in main-memory databases (MMDB). In this paper, we bridge this divide by examining the power–performance tradeoffs involved in designing MMDBs. In doing so, we first show how DRAM will soon emerge as the dominant source of power consumption in emerging MMDB servers, unlike traditional database servers, where CPU power consumption overshadows that of DRAM. Second, we show that using DRAM frequency scaling and power-down modes can provide substantial improvements in performance/Watt under both transactional and analytical workloads. This, again, contradicts rules of thumb established for traditional servers, where the most energy-efficient configuration is often the one with the highest performance. Based on our observations, we argue that the long-overlooked task of optimizing DRAM power consumption should henceforth be considered a first-class citizen in designing MMDBs. To this end, we highlight several promising research directions and identify key design challenges that must be overcome to achieve this goal.
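
    The claim that the most energy-efficient configuration need not be the fastest one reduces to simple arithmetic over throughput and power. The numbers in this Python sketch are made up for illustration, not measurements from the paper.

        configs = {
            # name: (throughput in queries/second, total power in watts); invented numbers
            "dram_full_frequency": (1000, 400),
            "dram_low_frequency":  (900,  310),  # 10% slower, but draws much less power
        }

        for name, (qps, watts) in configs.items():
            print(f"{name}: {qps} q/s at {watts} W -> {qps / watts:.2f} q/s per watt")

        # dram_full_frequency: 2.50 q/s per watt; dram_low_frequency: 2.90 q/s per watt.
        # The slower configuration is the more energy-efficient one, unlike the
        # race-to-idle intuition from CPU-dominated servers.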

    Alpine: Efficient In-Situ Data Exploration in the Presence of Updates

    Ever-growing data collections create the need for brief explorations of the available data, to extract relevant information before decision making becomes necessary. In this context of data exploration, current data analysis solutions struggle to quickly pinpoint useful information in data collections. One major reason is that loading data into a DBMS without knowing which part of it will actually be useful is a major bottleneck. To remove this bottleneck, state-of-the-art approaches execute queries in situ, thus avoiding the loading overhead. In situ query engines, however, are index-oblivious and lack sophisticated techniques to reduce the amount of data to be accessed. Furthermore, applications constantly generate fresh data and update existing raw data files, whereas state-of-the-art in situ approaches support only append-like workloads. In this demonstration, we showcase the efficiency of adaptive indexing and partitioning techniques for analytical queries in the presence of updates. We demonstrate an online partitioning and indexing tuner for in situ querying which plugs into a query engine and offers support for fast queries over raw data files. We present Alpine, our prototype implementation, which combines the tuner with a query executor incorporating in situ query techniques to provide efficient raw data access. We visually demonstrate how Alpine incrementally and adaptively builds auxiliary data structures and indexes over raw data files, and how it adapts its behavior as a side-effect of updates to the raw data files.
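
    One simple in-situ auxiliary structure is a positional map over a raw file, rebuilt when the file changes. The Python sketch below builds the map as a side-effect of the first access and invalidates it via the file's modification time; Alpine's incremental maintenance under updates is far more fine-grained, and the file format and invalidation rule here are assumptions.

        import csv, os

        class InSituFile:
            def __init__(self, path):
                self.path = path
                self.mtime = None
                self.offsets = None  # row number -> byte offset into the raw file

            def _refresh(self):
                mtime = os.path.getmtime(self.path)
                if self.offsets is None or mtime != self.mtime:
                    # File is new or was updated: rebuild the positional map
                    # (a real system would repair it incrementally).
                    self.offsets, pos = [], 0
                    with open(self.path, "rb") as f:
                        for line in f:
                            self.offsets.append(pos)
                            pos += len(line)
                    self.mtime = mtime

            def read_row(self, rowno):
                self._refresh()
                with open(self.path, "rb") as f:
                    f.seek(self.offsets[rowno])  # jump straight to the row
                    line = f.readline().decode()
                return next(csv.reader([line]))

        with open("demo.csv", "w") as f:
            f.write("a,1\nb,2\nc,3\n")
        src = InSituFile("demo.csv")
        print(src.read_row(1))  # ['b', '2']; positional map built on first access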

    Slalom: Coasting Through Raw Data via Adaptive Partitioning and Indexing

    The constant flux of data and queries alike has been pushing the boundaries of data analysis systems. The increasing size of raw data files has made data loading an expensive operation that delays the data-to-insight time. Hence, recent in-situ query processing systems operate directly over raw data, alleviating the loading cost. At the same time, analytical workloads have an increasing number of queries. Typically, each query focuses on a constantly shifting, yet small, range. Minimizing workload latency, then, requires bringing the benefits of indexing to in-situ query processing. In this paper, we present Slalom, an in-situ query engine that accommodates workload shifts by monitoring user access patterns. Slalom makes on-the-fly partitioning and indexing decisions based on information collected by lightweight monitoring. Slalom has two key components: (i) an online partitioning and indexing scheme, and (ii) a partitioning and indexing tuner tailored for in-situ query engines. Compared to the state of the art, Slalom offers performance benefits by taking into account user query patterns to (a) logically partition raw data files and (b) build lightweight partition-specific indexes for each partition. Due to its lightweight and adaptive nature, Slalom achieves efficient access to raw data with minimal memory consumption. Our experimentation with both micro-benchmarks and real-life workloads shows that Slalom outperforms state-of-the-art in-situ engines by 3–10×, and achieves query response times comparable to a fully indexed DBMS, offering much lower (~3×) cumulative query execution times for query workloads with increasing size and unpredictable access patterns.
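
    The combination of logical partitioning with lightweight partition-specific indexes can be sketched with per-partition min/max summaries (zone maps): range queries skip partitions whose summary cannot match. In the Python sketch below the partition size and the index type are fixed, illustrative assumptions, whereas Slalom chooses both adaptively from observed queries.

        PART = 4  # rows per logical partition (an illustrative, fixed choice)

        def build_zone_maps(values):
            parts = [values[i:i + PART] for i in range(0, len(values), PART)]
            return parts, [(min(p), max(p)) for p in parts]

        def range_query(parts, zones, lo, hi):
            out = []
            for part, (pmin, pmax) in zip(parts, zones):
                if pmax < lo or pmin > hi:
                    continue  # whole partition skipped without touching its rows
                out.extend(v for v in part if lo <= v <= hi)
            return out

        vals = [5, 7, 6, 8, 40, 42, 41, 43, 9, 8, 7, 50]
        parts, zones = build_zone_maps(vals)
        print(range_query(parts, zones, 40, 45))  # [40, 42, 41, 43]; first partition never touched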

    A parallel and distributed approach for diversified top-k best region search

    Given a set of points, the Best Region Search problem finds the optimal location of a rectangle of a specified size such that the value of a user-defined scoring function over its enclosed points is maximized. A recently proposed top-k algorithm for this problem returns results progressively, while also incorporating additional constraints, such as taking into consideration the overlap among the selected top-k rectangles. However, the algorithm is designed for a centralized setting and does not scale to very large datasets. In this paper, we overcome this limitation by enabling parallel and distributed computation of the results. We first propose a strategy that employs multiple rounds to progressively collect partial top-k results from each node in the cluster, while a coordinator handles the aggregation of the global top-k list, dealing with overlapping results. We then devise a single-round strategy, where the algorithm executed by each node is enhanced with additional conditions that anticipate potential overlapping solutions from neighboring nodes. Additional optimizations are proposed to further increase performance. Our experiments on real-world datasets indicate that our proposed algorithms are efficient and scale to millions of points.
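
    The coordinator's final step in the multi-round strategy, merging per-node partial top-k lists while rejecting overlapping regions, can be sketched as follows in Python. The scores, rectangles, and the non-overlap rule are illustrative assumptions, not the paper's exact scoring or pruning conditions.

        def overlaps(a, b):
            ax0, ay0, ax1, ay1 = a
            bx0, by0, bx1, by1 = b
            return not (ax1 <= bx0 or bx1 <= ax0 or ay1 <= by0 or by1 <= ay0)

        def merge_topk(partial_lists, k):
            # Each node ships its own (score, rect) candidates; the
            # coordinator considers all of them best-first.
            candidates = sorted((c for lst in partial_lists for c in lst),
                                key=lambda c: c[0], reverse=True)
            selected = []
            for score, rect in candidates:
                if all(not overlaps(rect, r) for _, r in selected):
                    selected.append((score, rect))
                    if len(selected) == k:
                        break
            return selected

        node1 = [(9, (0, 0, 2, 2)), (7, (5, 5, 7, 7))]
        node2 = [(8, (1, 1, 3, 3)), (6, (8, 0, 10, 2))]  # its best overlaps node1's best
        print(merge_topk([node1, node2], k=2))  # keeps the 9- and 7-scored regions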